class: inverse, center, middle
The emergence of high-throughput sequencing technologies such as 454 (Roche) and Solexa (Illumina) allowed for highly parallel, short-read sequencing of DNA molecules.
Sequencing is typically performed on bulk tissue or cells.
Analysis therefore describes the bulk characteristics of the data, without resolving the heterogeneity within the sample.
Newer technologies such as TRAP from the Heintz lab or nuclei sorting allow for capture of distinct cell types based on expressed markers.
Pros
- Allow for the capture of rare cell populations such as specific neuron types.

Cons
- Require known markers for desired cell populations.
With the advent of advanced microfluidics and refined sequencing technologies, single-cell sequencing has emerged as a technology to profile individual cells from a heterogeneous population without prior knowledge of cell populations.
Pros
- No prior knowledge of cell populations required.
- Simultaneously assess profiles of 1000s of cells.

Cons
- Low sequencing depth for individual cells (1000s of reads per cell vs millions of reads for a bulk sample).
Single-cell sequencing, as with bulk sequencing, has now been applied across a wide range of assays.
Many companies offer single-cell sequencing technologies that can be used with Illumina sequencers.
Two major companies offer the most widely used technologies.
The major differences between the two are the sequencing depth and the coverage profiles across transcripts.
Read 1
Read 2
The sequence reads contain:-
As with standard bulk sequencing data, the next steps are typically to align the data to a reference genome/transcriptome and summarize data to a signal matrix.
For processing scRNA-seq/snRNA-seq data from FASTQ to a count matrix, there are many options available to us.
Alignment and counting
- Cell Ranger count
- STAR
- STARsolo
- Subread cellCounts

Pseudoalignment and counting
- Salmon
- Alevin
- Kallisto
- Bustools
The output of these tools is typically a matrix of the signal attributed to cells and genes (typically read counts).
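As a minimal sketch of what "summarizing to a matrix" means, the toy example below builds a gene-by-cell count matrix from already-assigned reads. It assumes each record carries a cell barcode, a UMI and a gene, and that duplicates of the same (cell, UMI, gene) are collapsed before counting; the record values and the `count_matrix` helper are hypothetical, and real tools handle barcode correction and multi-mapping in ways not shown here.

```python
from collections import defaultdict

# Hypothetical assigned reads: (cell barcode, UMI, gene).
records = [
    ("AAAC", "UMI1", "Gapdh"),
    ("AAAC", "UMI1", "Gapdh"),  # duplicate of the record above
    ("AAAC", "UMI2", "Gapdh"),
    ("AAAC", "UMI3", "Actb"),
    ("TTTG", "UMI1", "Actb"),
]


def count_matrix(records):
    """Collapse duplicate (cell, UMI, gene) records, then count per gene and cell."""
    unique = set(records)
    counts = defaultdict(lambda: defaultdict(int))
    for cell, _umi, gene in unique:
        counts[gene][cell] += 1
    genes = sorted(counts)
    cells = sorted({cell for cell, _umi, _gene in unique})
    # Rows are genes, columns are cells; missing combinations count as 0.
    matrix = [[counts[gene][cell] for cell in cells] for gene in genes]
    return genes, cells, matrix


genes, cells, matrix = count_matrix(records)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Most entries of a real matrix are zero, which is why the output is stored in sparse formats rather than as a dense table like this one.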
This matrix is the input for all downstream post-processing, quality control, normalization, batch correction, clustering, dimension reduction and differential expression analysis.
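To make the first downstream steps concrete, here is a small sketch of per-cell quality control metrics and a simple normalization on a toy gene-by-cell matrix. The matrix values, the helper names and the choice of log-normalization to 10,000 counts per cell are illustrative assumptions, not a method prescribed here; dedicated single-cell packages offer several alternatives.

```python
import math

# Toy gene-by-cell count matrix (rows: genes, columns: cells); values are made up.
genes = ["Actb", "Gapdh", "Malat1"]
cells = ["AAAC", "TTTG"]
matrix = [
    [1, 1],
    [2, 0],
    [7, 3],
]


def per_cell_qc(matrix):
    """Return total counts and number of detected genes for each cell (column)."""
    n_cells = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cells)]
    detected = [sum(1 for row in matrix if row[j] > 0) for j in range(n_cells)]
    return totals, detected


def log_normalize(matrix, scale=10_000):
    """Scale each cell to `scale` total counts, then apply log(1 + x)."""
    totals, _ = per_cell_qc(matrix)
    return [
        [math.log1p(row[j] * scale / totals[j]) if totals[j] else 0.0
         for j in range(len(row))]
        for row in matrix
    ]


totals, detected = per_cell_qc(matrix)
normalized = log_normalize(matrix)
print("total counts per cell:", totals)
print("genes detected per cell:", detected)
```

Cells with very low totals or very few detected genes are common candidates for filtering before clustering.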
The output matrix is often stored in a compressed format such as:
- MEX (Market Exchange Format)
- HDF5 (Hierarchical Data Format)
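The MEX matrix file is plain text in Matrix Market coordinate layout: a header line, optional `%` comment lines, a size line (rows, columns, non-zero entries), then one `row column value` triple per non-zero entry, indexed from 1. The sketch below parses a tiny made-up example with the standard library only; `MTX_TEXT` and `read_mtx` are hypothetical names, and it assumes integer values with genes as rows and cells as columns. Real files are usually gzipped and accompanied by separate barcode and feature files, and in practice a library reader such as `scipy.io.mmread` would normally be used instead.

```python
import io

# A tiny, made-up Matrix Market file: 3 genes x 2 cells with 4 non-zero entries.
MTX_TEXT = """%%MatrixMarket matrix coordinate integer general
%example comment line
3 2 4
1 1 1
2 1 2
1 2 1
3 2 5
"""


def read_mtx(handle):
    """Parse a Matrix Market coordinate file into a {(row, col): value} dict."""
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())
    entries = {}
    for _ in range(n_entries):
        row, col, value = handle.readline().split()
        # The file is 1-indexed; convert to 0-indexed positions.
        entries[(int(row) - 1, int(col) - 1)] = int(value)
    return n_rows, n_cols, entries


n_genes, n_cells, entries = read_mtx(io.StringIO(MTX_TEXT))
# Expand to a dense table only because the example is tiny.
dense = [[entries.get((i, j), 0) for j in range(n_cells)] for i in range(n_genes)]
print(dense)
```

Only the non-zero entries are stored, which keeps the file small for the mostly empty matrices that single-cell experiments produce.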
]
]
This is an example of a directory
produced by Cell Ranger.
Cell Ranger is the typical approach we use to process 10x data. The default settings are pretty good. This is an intensive program, so we will not be running it locally on your laptops. Instead we run it on remote systems, like the HPC.
If you are working with your own data, the data will often be provided as the Cell Ranger output by the Genomics/Bioinformatics teams, like here at Rockefeller University.